The goal here is to validate the initial clustering and UMAP, and to ensure that the clusters reflect the underlying biology rather than batch effects.
#+ message = FALSE, warning = FALSE
suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(EnsDb.Hsapiens.v86))
suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
suppressPackageStartupMessages(library(Matrix))
suppressPackageStartupMessages(library(SeuratWrappers))
suppressPackageStartupMessages(library(harmony))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
setwd("~/gibbs/DOGMAMORPH/Ranalysis")
results <- readRDS("Objects/20230518Completeobj.rds")
First I am going to check that there is no library- or individual-driven batch effect across the UMAP by coloring the cells by each of those variables. I’ll also visualize the treatment and timepoint to see whether we’re picking up differences there.
DimPlot(results, label = TRUE, reduction = "umap")
## Loading required package: Signac
DimPlot(results, group.by="orig.ident", reduction = "umap")
DimPlot(results, group.by = "Participant", reduction = "umap")
DimPlot(results, group.by = "Treatment", reduction = "umap")
DimPlot(results, group.by = "Timepoint", reduction = "umap")
DimPlot(results, group.by = "Cell.Source", reduction = "umap")
Notes:
Clustering - Overall looks good. The substructure of outlying populations appears reasonably well separated by the clustering algorithm, although the mass of presumably CD4+ T cells is potentially overclustered: we can see several overlapping and presumably small clusters in that mass.
Library - No clusters appear to be driven entirely by library. There are differences in distribution, but I suspect those are primarily due to PBMC vs CD4 libraries.
Participant - There is clearly some variation by participant that is not entirely corrected for (e.g. there is a patch of green that may be 5018 or something else along the bottom left of the large cluster). Overall this may be acceptable variation, but after annotation I should check that there is not too much participant-to-participant variability in cluster membership.
Treatment - Minor treatment-driven differences, but overall appears well mixed.
Time point - There appear to be some outlying populations that are potentially perturbed, but overall it looks well mixed.
Cell source - It looks like the main cluster is CD4, with a subcluster pulled out to the left and to the bottom. CD4 also participates in other clusters, likely due to low levels of contamination.
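The participant check mentioned in the notes could be sketched roughly as below. This is only an illustration in base R: `toy_meta` is made-up data standing in for `results@meta.data`, the `seurat_clusters` column name is assumed (it is Seurat's usual default but has not been confirmed for this object), the helper names are hypothetical, and the 0.8 cutoff is arbitrary.

```r
# Sketch: fraction of each cluster contributed by each participant.
# With the real object, `meta` would be results@meta.data (assumption).
cluster_composition <- function(meta, cluster_col = "seurat_clusters",
                                group_col = "Participant") {
  counts <- table(meta[[cluster_col]], meta[[group_col]])
  # Row-normalise so the fractions within each cluster sum to 1
  prop.table(counts, margin = 1)
}

# Flag clusters where a single group contributes more than `threshold`
# of the cells (the threshold is an arbitrary illustrative choice)
dominated_clusters <- function(props, threshold = 0.8) {
  rownames(props)[apply(props, 1, max) > threshold]
}

# Made-up toy metadata, for illustration only
toy_meta <- data.frame(
  seurat_clusters = c("0", "0", "0", "0", "1", "1", "1", "1", "1"),
  Participant     = c("A", "B", "A", "B", "A", "A", "A", "A", "A")
)

props <- cluster_composition(toy_meta)
print(round(props, 2))
print(dominated_clusters(props))
```

The same function could be pointed at `orig.ident`, `Treatment`, or `Timepoint` through `group_col` to put numbers on the mixing judged by eye in the plots above.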
Also checking here whether any QC-related factors might be driving some of the cluster distribution:
FeaturePlot(results, "mt", reduction = "umap")
FeaturePlot(results, "nCount_RNA", reduction = "umap")
FeaturePlot(results, "nFeature_RNA", reduction = "umap")
FeaturePlot(results, "nCount_ATAC", reduction = "umap")
FeaturePlot(results, "nFeature_ATAC", reduction = "umap")
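As a numeric complement to the feature plots, a per-cluster summary of the QC metrics might look like the sketch below. It uses base R only: `toy_meta` is made-up data standing in for `results@meta.data`, the `seurat_clusters` column name is an assumption, and `qc_by_cluster` is a hypothetical helper, not part of Seurat.

```r
# Sketch: median of each QC metric within each cluster.
# With the real object, `meta` would be results@meta.data (assumption).
qc_by_cluster <- function(meta, metrics, cluster_col = "seurat_clusters") {
  aggregate(meta[, metrics, drop = FALSE],
            by = setNames(list(meta[[cluster_col]]), cluster_col),
            FUN = median)
}

# Made-up toy metadata, for illustration only
toy_meta <- data.frame(
  seurat_clusters = c("0", "0", "0", "1", "1", "1"),
  mt              = c(2, 4, 3, 10, 12, 11),
  nCount_RNA      = c(5000, 6000, 5500, 900, 1000, 1100)
)

qc <- qc_by_cluster(toy_meta, metrics = c("mt", "nCount_RNA"))
print(qc)
```

A cluster whose medians sit far from the rest (high `mt`, low counts) would be a candidate for being driven by cell quality rather than biology.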